ABSTRACT

During 2006-7, Ad14a was identified in a series of febrile respiratory illness (FRI) outbreaks across the US, involving at least ten documented pneumonia fatalities. Leveraging sequence data from the prototype strain Ad14p (GenBank # AY803294), the full genome sequence of Ad14a was determined using the conventional, and very labor-intensive, Sanger sequencing method. The same genome was also analyzed using pyrosequencing, an emerging alternative genome sequencing technology offering much higher efficiency. This direct shotgun approach relies on random sequencing of small DNA fragments using common adaptor sequences, rather than independent amplification of separate fragments using predetermined pathogen-specific sequences. This new sequencing strategy is therefore ideally suited for the rapid sequencing of hitherto uncharacterized human pathogens. The Roche 454 FLX system was used to sequence and assemble multiple Ad14a viruses from recent US outbreaks, as well as closely related Ad11a isolates that have caused non-US acute respiratory disease (ARD) infections since the 1970s. The US Ad14a strain diverges significantly from the prototypical Eurasian strain, Ad14p, and shares greater than 98% genomic homology with Ad11a. Two genome types of Ad11, Ad11p and Ad11a, display different tissue tropisms, causing renal and upper respiratory infections, respectively.
Ad14a and Ad11a share almost identical fiber genes, which are known to be responsible for adenovirus organ tropism, and both cause ARD infections. Both also share highly homologous hexon genes, except for a 400-base-pair (bp) region that allows the two viruses to be clearly differentiated from each other by serological cross-reactivity. The origin of the emergent Ad14a may be related to recombination events that shuffled the tissue tropism and antigen loci of ancestral Ad11 and Ad14 strains. High-throughput sequencing is a powerful tool for rapid analysis of emerging pathogens and can generate comparative data on the genome-wide relationships between those pathogens and their well-characterized relatives.

INTRODUCTION 

Human adenoviruses cause respiratory infections with symptoms ranging from febrile respiratory illness (FRI), which is common, to pneumonia and death, which are rarer. In civilian populations, human adenovirus infections occur as sporadic local or national epidemics. Epidemics are often associated with the emergence of new variants (genome types) of otherwise common serotypes, such as Ad4, or with the recent emergence of rare serotypes. Adenoviruses of three species, AdB, AdC and AdE, are frequently associated with respiratory disease, mostly in young and healthy populations.
AdC serotypes, which include Ad1, Ad2, Ad5 and Ad6, are endemic among children and young adults. Pre-existing antibodies against AdC are extremely prevalent in the general population, so many adults are immune to these serotypes. The sole AdE serotype, Ad4, and several AdB serotypes, including Ad3, Ad7 and Ad21, are commonly associated with epidemic outbreaks of FRI and pneumonia in healthy adults and children throughout the world. Until recently, two closely related AdB serotypes, Ad11a and Ad14, had been identified only in association with rare (though severe) respiratory disease outbreaks in Eurasia and Southeast Asia. Prior to the 2006-7 North American outbreaks, Ad14 had never been isolated from any case in North America.

During 2006 and 2007, Ad14 was identified in a series of outbreaks across the United States, associated with widespread FRI and at least ten documented pneumonia fatalities. This phenomenon was tracked simultaneously in civilian populations, by the CDC and local public health agencies, and in military recruits, by US Department of Defense (DoD) public health agencies. Detailed comparisons of collected strains, including sequencing of the fiber and hexon genes and whole-genome restriction analysis (genome typing), revealed that all of these events were caused by a single, apparently homogeneous strain. The identified strain had diverged significantly from the prototypical Eurasian strain, Ad14p, identified in the mid-1900s. Full genome sequencing of multiple Ad14a isolates, including both a fatal pneumonia isolate from a severe outbreak and an isolate from a mild outbreak that had little effect on local adenoviral illness rates, revealed only two genetic polymorphisms between the two strains, one of which was a synonymous base mutation in the fiber gene (HS Houng, unpublished data). This high degree of homogeneity (clonality) offered no simple way, such as pathogen- or genotype-specific PCR, to track different lineages of the emerging virus, which has had a significant impact on US military personnel. Scientists at the Walter Reed Army Institute of Research illustrated the utility of the recently developed high-throughput “next-generation” pyrosequencing technique for rapid genome analysis of the emergent human adenovirus 14a that caused the 2006-7 FRI outbreaks in the US. “Next-generation” pyrosequencing offers several advantages for full viral genome studies. It eliminates the requirement for target-specific PCR and sequencing primers during amplification and sequencing, and it extends the US DoD's capacity to identify and detect future emergent pathogens for which no reference sequence is available.
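Whether a point substitution is synonymous can be checked by translating the reference and mutant codons and comparing the encoded amino acids. The sketch below is a minimal illustration, assuming an in-frame coding sequence that begins at its first codon and the standard genetic code; the function name `classify_snp` and the toy sequence are hypothetical and are not taken from the Ad14 fiber gene.

```python
# Standard genetic code: codons enumerated with bases in TCAG order
# (first base slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution in an in-frame coding sequence.

    `position` is 0-based within `cds`; `cds` is assumed to begin with the
    first base of a codon (reading frame 1) and to contain only A/C/G/T.
    """
    cds = cds.upper()
    codon_start = position - position % 3   # first base of the affected codon
    offset = position - codon_start         # 0, 1 or 2 within the codon
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies in an incomplete terminal codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"non-synonymous ({ref_aa}->{alt_aa})"


# Toy example: GCT -> GCC still encodes alanine; GCT -> ACT encodes threonine.
print(classify_snp("ATGGCTTAA", 5, "C"))  # synonymous
print(classify_snp("ATGGCTTAA", 3, "A"))  # non-synonymous (A->T)
```

Changes at the third codon position are often synonymous because of the degeneracy of the genetic code, which is how a single nucleotide difference can leave the encoded protein unchanged.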

MATERIALS  &  METHODS 

Sample collection. All Ad14 isolates from the 2006-7 outbreaks at basic training facilities other than Lackland were collected, identified and analyzed as part of the Naval Health Research Center (NHRC)'s ongoing population-based FRI surveillance program. Ad14 isolates of Lackland origin comprised NHRC surveillance samples and samples collected from severely ill patients at Wilford Hall Medical Center, located at Lackland. These samples, together with samples from advanced Air Force training centers, were sent to the Air Force Institute of Occupational Health (AFIOH) for diagnostic viral culture and were provided to NHRC as de-identified isolates. All samples collected by NHRC were provided for research use under informed consent and institutional review board-approved human use protocols (Protocol # NHRC.2005.0017).

NHRC samples were collected as oropharyngeal (throat) swabs in viral transport medium (VTM; Remel, Lenexa, KS), immediately frozen either in -80°C freezers or on dry ice, and transported on dry ice to NHRC under CAP-accredited collection and transport protocols. AFIOH samples were collected as throat swabs in VTM, cultured in A549 cells and transported to NHRC as above. All of the above samples were tested at NHRC for Ad14 as raw specimens (see PCR and sequencing methods below), then cultured in A549 cells (Diagnostic Hybrids, Athens, OH) and stored frozen as infected tissue culture fluid (isolated virus). Sequencing work on these samples was performed on the resulting isolates. Following PCR identification as Ad14, selected samples were extracted and aliquoted at NHRC and transported frozen on dry ice to the Walter Reed Army Institute of Research (WRAIR) for sequence analysis.

Ad14-specific PCR amplification and conventional Sanger DNA sequencing of Ad14 isolates. PCR and sequence analysis were performed at the CLIA/CLIP-accredited virology facility at the Walter Reed Army Institute of Research, using the methods detailed in the next paragraph. The PCR primer pairs used to generate overlapping 1-2 kilobase (kb) amplicons covering the entire genomes of various Ad14 isolates were derived from the Ad14 de Wit prototype sequence (GenBank accession AY803294). All PCR products were sequenced in both directions using the forward and reverse PCR primers corresponding to each individual product. All clean, verified, readable sequences were used to assemble the full Ad14 genome sequence with the Sequencher program (Gene Codes Co., Ann Arbor, MI).
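The overlapping-amplicon design described above amounts to tiling the genome with windows that share an overlap, so that adjacent amplicon sequences can be joined during assembly. In the sketch below, the 1,500-bp amplicon length (within the stated 1-2 kb range) and the 200-bp overlap are illustrative assumptions, not the primer layout actually used, and `tile_amplicons` is a hypothetical helper; real primer placement is further constrained by primer-binding sequence and G/C content. The 34,768-bp genome length is the Lackland Ad14 genome size reported in the Results.

```python
from typing import List, Tuple


def tile_amplicons(genome_length: int, amplicon_length: int,
                   overlap: int) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) windows covering the genome.

    Consecutive windows share at least `overlap` bases. The last window is
    shifted left so that it ends exactly at the genome end.
    """
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    if amplicon_length >= genome_length:
        return [(0, genome_length)]
    step = amplicon_length - overlap
    windows = []
    start = 0
    while start + amplicon_length < genome_length:
        windows.append((start, start + amplicon_length))
        start += step
    # Final window anchored at the genome end (overlaps its neighbour by >= `overlap`).
    windows.append((genome_length - amplicon_length, genome_length))
    return windows


windows = tile_amplicons(34_768, 1_500, 200)
print(len(windows), windows[0], windows[-1])  # 27 (0, 1500) (33268, 34768)
```

Anchoring the last window at the genome end, rather than letting it run past the end, keeps every amplicon the same nominal length at the cost of a larger final overlap.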

200 µl aliquots of Ad14 samples were extracted using the Invitrogen ChargeSwitch DNA extraction kit (Invitrogen Inc., CA) per the manufacturer's instructions and eluted into 200 µl of buffer. Each 100 µl PCR amplification reaction consisted of 2 mM MgCl2, 0.6 mM dNTPs (0.15 mM each of dATP, dCTP, dTTP and dGTP), 200 nM of each primer (see previous paragraph), 2.5 units of Platinum Taq Polymerase (Invitrogen, Carlsbad, CA) and 1 µl of extracted sample in 1X ABI Buffer II (Applied Biosystems, Foster City, CA).
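The reaction recipe above can be converted into pipetting volumes with the dilution relation C1V1 = C2V2. In this sketch the reaction volume, 1X buffer, 2 mM MgCl2, 0.6 mM total dNTPs, 2.5 units of polymerase and 1 µl of template come from the text, and the per-nucleotide concentration is simply that 0.6 mM total divided by four. Every stock concentration, and the 0.2 µM final primer concentration, is an assumption for illustration; substitute the values printed on the actual reagent labels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    stock: float  # stock concentration (units must match `final`)
    final: float  # final concentration wanted in the reaction


def build_recipe(components: List[Component], reaction_ul: float,
                 template_ul: float) -> Dict[str, float]:
    """Microlitres of each stock from C1*V1 = C2*V2, plus water to volume."""
    recipe = {c.name: c.final * reaction_ul / c.stock for c in components}
    recipe["extracted sample"] = template_ul
    water = reaction_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    recipe["water"] = water
    return recipe


REACTION_UL = 100.0
TOTAL_DNTP_MM = 0.6  # total dNTPs from the text, split equally over four nucleotides

# ASSUMED stock concentrations and ASSUMED primer final concentration.
COMPONENTS = [
    Component("10X Buffer II", stock=10.0, final=1.0),               # fold
    Component("MgCl2", stock=25.0, final=2.0),                       # mM
    Component("dNTPs (each)", stock=10.0, final=TOTAL_DNTP_MM / 4),  # mM of each dNTP
    Component("forward primer", stock=10.0, final=0.2),              # uM
    Component("reverse primer", stock=10.0, final=0.2),              # uM
    Component("Platinum Taq", stock=5.0, final=2.5 / REACTION_UL),   # U/ul (2.5 U total)
]

for name, ul in build_recipe(COMPONENTS, REACTION_UL, template_ul=1.0).items():
    print(f"{name:>16}: {ul:5.2f} ul")
```

Computing water last, as the remainder, guarantees the components always sum to the stated reaction volume and flags an impossible recipe instead of silently overfilling the tube.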
Thermal cycling was carried out on an ABI 9700 platform (Applied Biosystems) using the following parameters: initial activation for 2 min at 94°C, then 35 cycles of 20 s at 94°C, 20 s at 53°C and 2 min at 72°C, followed by a final extension of 7 min at 72°C. PCR cleanup was performed using the Qiagen PCR Cleanup Kit (Qiagen) per the manufacturer's instructions. Sequencing reactions were set up per the manufacturer's instructions using the ABI BigDye Terminator Kit (manual version 3.2, Applied Biosystems) and run on an ABI 9700 platform. Reaction products were analyzed on an ABI 3130xl automated sequencer (Applied Biosystems) per the manufacturer's instructions. The resulting data were then edited and aligned using the Mac version of Sequencher (Gene Codes Inc., Ann Arbor, MI).
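As a quick check on instrument time, the cycling program above can be encoded as data and its hold times summed. This sketch uses only the temperatures and durations given in the text; `Step` and `hold_time_seconds` are hypothetical names, and ramping between temperatures is ignored, so the result is a lower bound on the real run time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float
    seconds: int


def hold_time_seconds(initial: List[Step], cycle: List[Step],
                      n_cycles: int, final: List[Step]) -> int:
    """Total time at the programmed temperatures; ramp time is not included."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


INITIAL = [Step(94, 2 * 60)]                        # initial activation
CYCLE = [Step(94, 20), Step(53, 20), Step(72, 2 * 60)]  # denature, anneal, extend
FINAL = [Step(72, 7 * 60)]                          # final extension

total = hold_time_seconds(INITIAL, CYCLE, 35, FINAL)
print(f"{total} s = {total / 60:.1f} min")  # 6140 s = 102.3 min
```

The 2-min extension step dominates each cycle, which is consistent with amplicons in the 1-2 kb range.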

Full Ad14 genome sequencing via “next-generation” pyrosequencing. The GS DNA Library Preparation Kit (Roche 454 Life Sciences, Branford, CT) was used to process 1-2 µg of Ad14 DNA per virus into a library of single-stranded template DNA fragments (sstDNA). Such a library can then be used as input to GS emPCR (shotgun) amplification and sequenced on the Genome Sequencer system developed by 454 Life Sciences Corporation. The sequencing reads obtained from the Roche 454 FLX sequencer for each individual virus were assembled into a single contig representing the full, complete Ad14 genome sequence.
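The step of joining reads into a single contig can be illustrated with a deliberately simplified greedy overlap-merge assembler. This is a teaching sketch, not the assembler used in this study (which the text does not name) and not how the 454 software works: it assumes error-free reads from one strand, ignores reverse complements and sequencing errors such as homopolymer-length miscalls, and has no special handling for repeats. The function names and toy reads are hypothetical.

```python
from typing import List


def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, or 0 if below min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Toy greedy overlap-merge assembly of error-free, same-strand reads."""
    # Drop duplicate reads and reads contained within longer reads.
    contigs: List[str] = []
    for read in sorted(set(reads), key=len, reverse=True):
        if not any(read in c for c in contigs):
            contigs.append(read)
    # Repeatedly merge the pair of contigs with the longest suffix/prefix overlap.
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap_length(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: the reads do not join into one contig
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (i, j) and c not in merged]
        contigs.append(merged)
    return contigs


reads = ["ATGCGTAC", "GTACGGAT", "GGATTCCA", "TACGG"]
print(greedy_assemble(reads))  # ['ATGCGTACGGATTCCA']
```

With the 400- to 500-fold depth reported in the Results, real assemblers can out-vote individual read errors, which this toy merger cannot do.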

Bioinformatic analysis of assembled Ad14 genomes. Full Ad14 genome sequences obtained in this study by both conventional Sanger sequencing and pyrosequencing were compared using the ClustalW alignment program (DNASTAR, WI).
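Once the Sanger- and pyrosequencing-derived genomes have been aligned (here, with ClustalW), checking whether they agree reduces to a column-by-column comparison. The sketch below assumes the two aligned sequences have been exported as equal-length strings with '-' marking gaps; the name `compare_aligned` and the convention of counting gap columns as differences are assumptions, since percent-identity definitions vary between tools.

```python
from typing import List, Tuple


def compare_aligned(a: str, b: str) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Percent identity and differing columns of two equal-length aligned sequences.

    Gap characters ('-') count as differences; identity is computed over
    all alignment columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    diffs = [(i + 1, x, y)  # 1-based alignment column, base in a, base in b
             for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]
    identity = 100.0 * (len(a) - len(diffs)) / len(a)
    return identity, diffs


identity, diffs = compare_aligned("ACGT-ACGTA", "ACGTTACGAA")
print(f"{identity:.1f}% identical; differences: {diffs}")
# 80.0% identical; differences: [(5, '-', 'T'), (9, 'T', 'A')]
```

An empty difference list at 100% identity corresponds to two methods yielding the same genome sequence.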

RESULTS & DISCUSSION

Assembling full Ad14 genomic sequences from the 2006-7 US outbreaks via Ad14-specific PCR amplification followed by conventional Sanger DNA sequencing. We demonstrated the utility of these methods by obtaining full genome sequences of several Ad14 isolates from the recent 2006-7 North American outbreaks, including a pair of Ad14 isolates from patients with severe (Ad14 LL1986T) and mild (Ad14 LL303600) ARD at Lackland, Texas. A third isolate, Ad14 NHRC 22039, from a patient with mild ARD in San Diego, was also fully sequenced in this study. At present, almost all adenovirus DNA sequence data in GenBank are derived from either PCR amplicons or cloned recombinant DNA fragments by conventional Sanger sequencing. We decided to sequence and assemble full Ad14 genomes from isolates collected at various military basic training camps, including isolates that caused either severe or mild ARD. This would allow scientists to understand the molecular epidemiology and possible molecular pathogenesis of the recent US Ad14 infections as compared with infections caused by the prototype Ad14p de Wit strain in the 1950s.
As described in Materials & Methods, multiple Ad14 PCR primer pairs (derived from the Ad14 de Wit prototype sequence, GenBank accession AY803294) were used to generate overlapping 1-2 kb amplicons covering the entire genomes of the Ad14 strains of interest. All PCR products were sequenced in both directions using the forward and reverse PCR primers corresponding to each individual product, and all clean, verified, readable sequences were used to assemble the full Ad14 genome sequence with the Sequencher program (Gene Codes Co., Ann Arbor, MI). Finishing any single Ad14 strain regularly took more than two months of effort and required testing more than 100 Ad14-specific primer pairs through PCR amplification followed by Sanger sequencing of the PCR products. Sequencing the high-G/C-content regions of human Ad14 is not straightforward, because these regions cause failures in PCR amplification or in Sanger BigDye terminator reactions. As a result, the time needed to complete a full Ad14 genome is unpredictable, often exceeding two months of searching for workable PCR/sequencing primers to fill the sequencing gaps.
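High-G/C stretches of the kind that caused PCR and BigDye failures can be located in advance with a sliding-window scan of a reference sequence, for example to anticipate where primer design will struggle. The window size, step and 65% threshold below are illustrative assumptions, not values from this study, and `gc_windows`/`flag_high_gc` are hypothetical helpers; the toy sequence is synthetic.

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(1-based window start, G+C percent) for each full window along `seq`."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        out.append((start + 1, 100.0 * gc / window))
    return out


def flag_high_gc(seq: str, window: int = 100, step: int = 50,
                 threshold: float = 65.0) -> List[Tuple[int, float]]:
    """Windows whose G+C content meets or exceeds `threshold` percent (ASSUMED cutoff)."""
    return [(pos, gc) for pos, gc in gc_windows(seq, window, step) if gc >= threshold]


seq = "AT" * 50 + "GC" * 50 + "AT" * 50  # synthetic 300-bp example
print(flag_high_gc(seq))  # [(101, 100.0)]
```

Smaller windows localize G/C-rich stretches more precisely but give noisier estimates; the step controls how finely the scan is sampled.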

Assembling full Ad14 genomic sequences from the 2006-7 US outbreaks via “next-generation” pyrosequencing. We confirmed that “next-generation” pyrosequencing can generate massive quantities of sequence, more than 15 Mb of human Ad14 sequence per virus, from each sequencing experiment. Up to 8 different viruses could be fully sequenced per pyrosequencing run. We found that 15 Mb of sequencing data per virus provides ample depth of full-genome coverage, i.e., up to 400- to 500-fold coverage of the entire 34,768-bp Ad14 genomes of Lackland origin, such as Ad14 LL1986T and LL303600.
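The coverage figure follows from dividing the amount of sequence obtained by the genome length. The sketch below uses only numbers given in the text; it computes mean depth and assumes every sequenced base belongs to the viral genome, which overstates usable depth if host-cell or adaptor sequence is present. The helper name `fold_coverage` is hypothetical.

```python
def fold_coverage(bases_sequenced: float, genome_length: int) -> float:
    """Mean sequencing depth = total sequenced bases / genome length."""
    return bases_sequenced / genome_length


GENOME_BP = 34_768        # Lackland Ad14 genome length reported in the text
BASES_PER_VIRUS = 15e6    # ">15 Mb" of sequence per virus
VIRUSES_PER_RUN = 8       # up to 8 viruses per pyrosequencing run

depth = fold_coverage(BASES_PER_VIRUS, GENOME_BP)
print(f"mean depth ~ {depth:.0f}x")  # mean depth ~ 431x
print(f"sequence per run ~ {BASES_PER_VIRUS * VIRUSES_PER_RUN / 1e6:.0f} Mb")  # sequence per run ~ 120 Mb
```

The roughly 431-fold mean agrees with the 400- to 500-fold range above. Under the idealized Lander-Waterman model the expected uncovered fraction is about e^(-c) for mean depth c, which is negligible at this depth, although real pyrosequencing coverage is not perfectly uniform.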

Accurate, matching full Ad14 genomic sequences from the 2006-7 US outbreaks derived by conventional Sanger DNA sequencing and “next-generation” pyrosequencing. In this study, we adapted both the conventional Sanger sequencing methodology and the “next-generation” pyrosequencing technique to assemble matching (identical) full genome sequences of the human adenoviruses, Ad14s, of the 2006-7 outbreaks. We showed that “next-generation” pyrosequencing technology can replace the labor-intensive Sanger DNA sequencing method to generate full Ad14 genome sequences that are accurate when compared with the reference Sanger-derived Ad14 sequences. Most importantly, “next-generation” pyrosequencing offers tremendous time savings: up to 8 different Ad14 strains could be sequenced and assembled in less than 5 working days. In addition to Ad14s, the Roche 454 FLX system was used to sequence and assemble closely related Ad11a isolates that have caused non-US ARD infections (mostly in Southeast Asia) since the 1970s. The US Ad14a strain was shown to have diverged significantly from the prototypical Eurasian strain, Ad14p, and to share greater than 98% genomic homology with Ad11a. Two genome types of Ad11, Ad11p and Ad11a, display different tissue tropisms, causing renal and upper respiratory infections, respectively. Ad14a and Ad11a share almost identical fiber genes, which are known to be responsible for adenovirus organ tropism, and both cause ARD infections.
Both also share highly homologous hexon genes, except for a 400-base-pair (bp) region that allows the two viruses to be clearly differentiated from each other by serological cross-reactivity. The origin of the emergent Ad14a may be related to recombination events that shuffled the tissue tropism and antigen loci of ancestral Ad11 and Ad14 strains.
High-throughput sequencing is a powerful tool for rapid analysis of emerging pathogens and can generate comparative data on the genome-wide relationships between those pathogens and their well-characterized relatives. This recently developed “next-generation” pyrosequencing technology will be an invaluable tool for quickly uncovering and studying potential future emergent infectious diseases, by assembling full and accurate pathogen genomes without any previously published literature or reference sequences.